Abstract. With double hashing, for a key $x$, one generates two hash
values $f(x)$ and $g(x)$, and then uses combinations $(f(x) + i g(x)) \bmod n$
for $i = 0, 1, 2, \ldots$ to generate multiple hash values in the range
$[0, n-1]$ from the initial two. For balanced allocations, keys are hashed
into a hash table where each bucket can hold multiple keys, and each key is
placed in the least loaded of d choices. It has been shown previously, using
fluid limit methods, that asymptotically the performance of double hashing
and fully random hashing is the same in the balanced allocation paradigm.

Here we extend to the balanced allocation setting a coupling argument that
Lueker and Molodowitch used to show that double hashing and ideal uniform
hashing are asymptotically equivalent in the setting of open address hash
tables, providing further insight into this phenomenon. We also discuss the
potential for, and the bottlenecks limiting, the use of this approach
for other multiple choice hashing schemes.

1 Introduction 

An interesting result from the hashing literature shows that, for open addressing, 
double hashing has the same asymptotic performance as uniform hashing. We 
explain the result in more detail. In open addressing, we have a hash table with 
n cells into which we insert m keys; we use $\alpha = m/n$ to refer to the load factor.
Each element is placed according to a probe sequence, which is a permutation of 
the cells. To place a key, we run through its probe sequence in order, and place the 
key in the first empty cell found. (Each cell can hold one key.) The term uniform 
hashing is used to refer to the idealized situation where the probe sequences are 
independent, uniform permutations. A key metric for such a scheme is the search 
time for an unsuccessful search, which is the number of probes until an empty 
cell is found. When the load is $\alpha$, the expected number of probes is easily
shown to be $(n+1)/(n - \alpha n + 1) = (1-\alpha)^{-1} + O(1/n)$.

In contrast to uniform hashing, with double hashing, for a key $x$ one generates
two hash values $f(x)$ and $g(x)$, and then uses combinations
$(f(x) + i g(x)) \bmod n$ for $i = 0, 1, 2, \ldots$ to generate the permutation
on $[0, n-1]$. Here we assume that n is prime, the hash $f(x)$ is uniform over
$[0, n-1]$, and $g(x)$ is uniform over $[1, n-1]$. It might appear that limiting
the space of random choices with double hashing would significantly impact
performance, but this is not the case; it has been shown that the search time
for an unsuccessful search remains $(1-\alpha)^{-1} + o(1)$ [3, 6, 11].
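
As a small illustration of the probe sequence just described, the following Python sketch generates $(f(x) + i g(x)) \bmod n$ for a prime n and checks that it visits every cell exactly once. The function name and the particular values of f(x), g(x), and n are hypothetical choices made for this example only.

```python
def double_hash_probe_sequence(f_x: int, g_x: int, n: int) -> list[int]:
    """Return the cells (f(x) + i*g(x)) mod n for i = 0, ..., n-1."""
    if not (0 <= f_x < n and 1 <= g_x < n):
        raise ValueError("need f(x) in [0, n-1] and g(x) in [1, n-1]")
    return [(f_x + i * g_x) % n for i in range(n)]


# Illustrative values only: n = 11 is prime, so any g(x) in [1, n-1] is
# relatively prime to n and the sequence is a permutation of the cells.
n = 11
seq = double_hash_probe_sequence(f_x=3, g_x=4, n=n)
is_permutation = sorted(seq) == list(range(n))
print(seq, is_permutation)
```

When n is not prime the same check fails for any g(x) that shares a factor with n, which is why g(x) is required to be relatively prime to n in that case.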
It is natural to ask whether similar results can be proved for other standard
hashing schemes. For Bloom filters, Kirsch and Mitzenmacher [7], starting from
the empirical analysis by Dillinger and Manolios [3], prove that using double
hashing has asymptotically negligible effects on Bloom filter performance.
(Indeed, several publicly available implementations of Bloom filters now use
double hashing.) Bachrach and Porat use double hashing in a variant of min-wise
independent sketches [5]. Mitzenmacher and Thaler show suggestive preliminary
results for double hashing for peeling algorithms and cuckoo hashing [?]. Leconte
considers double hashing in the context of the load threshold for cuckoo hashing,
and shows that the thresholds are the same if one allows double hashing to fail
to place o(n) keys [10]. Recently, Mitzenmacher has shown that double hashing
asymptotically has no effect on the load distribution in the setting of balanced
allocations [13]; we describe this result further in the related work below.

As a brief review, the standard balanced allocation paradigm works as follows:
suppose m balls (the keys) are sequentially placed into n bins (hash table
buckets), where each ball is placed in the least loaded of d uniform independent
choices of the bins. Typically we think of each of these d choices as being
obtained from a random hash function; we therefore refer to this setting as
using random hashing. We use the standard balls and bins nomenclature for this
setting (although one could correspondingly use keys and buckets). In the case
where the numbers of balls and bins are equal, that is m = n, the maximum
load (that is, the maximum number of balls in a bin) is
$\log\log n / \log d + O(1)$, much lower than the
$(1 + o(1)) \log n / \log\log n$ obtained where each ball is placed according
to a single uniform choice [1]. Further, using a fluid limit model that yields a
family of differential equations describing the balanced allocations process, one
can determine, for any constant j, the asymptotic fraction of bins of load j as n
goes to infinity, and Chernoff-type bounds hold that can bound the fraction of
bins of load j for finite n [12]. (These results extend naturally when m = cn for
a constant c; the maximum load remains $\log\log n / \log d + O(1)$.)

For balanced allocations in conjunction with double hashing, the jth ball
obtains two hash values, $f(j) \in [0, n-1]$ and $g(j) \in [1, n-1]$, chosen
uniformly from these ranges. The d choices for the jth ball are then given by
$h(j, k) = (f(j) + k g(j)) \bmod n$, $k = 0, 1, \ldots, d-1$, and the ball is
placed in the least loaded. For convenience in this paper we take n to be prime,
but the results can be modified straightforwardly by having g(j) chosen
relatively prime to n. In particular, if n is a power of 2, as is natural in
practice, by having g(j) uniformly chosen from the odd numbers in $[1, n-1]$ we
obtain analogous results.
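
The placement rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two hash values are drawn from a pseudorandom generator rather than computed from keys, and ties are broken in favor of the earliest choice (the paper fixes a specific random tie-breaking rule later).

```python
import random


def double_hash_choices(f_j: int, g_j: int, d: int, n: int) -> list[int]:
    """The d bins h(j, k) = (f(j) + k*g(j)) mod n for k = 0, ..., d-1."""
    return [(f_j + k * g_j) % n for k in range(d)]


def throw_balls_double_hashing(m: int, n: int, d: int, rng: random.Random) -> list[int]:
    """Place m balls into n bins (n prime), each in the least loaded of d choices."""
    loads = [0] * n
    for _ in range(m):
        f_j = rng.randrange(n)        # uniform on [0, n-1]
        g_j = rng.randrange(1, n)     # uniform on [1, n-1]
        choices = double_hash_choices(f_j, g_j, d, n)
        # Assumption for this sketch: ties go to the earliest choice in the list.
        target = min(choices, key=lambda b: loads[b])
        loads[target] += 1
    return loads


# Illustrative run with m = n = 1009 (a prime) and d = 3.
loads = throw_balls_double_hashing(m=1009, n=1009, d=3, rng=random.Random(0))
print("maximum load:", max(loads))
```

Because n is prime and g(j) is nonzero, the d choices of a ball are always distinct bins, a fact the coupling proof relies on.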

The purpose of this paper is to provide an alternative proof that double
hashing has asymptotically negligible effects in the setting of balanced
allocations. Specifically, we extend to the balanced allocation setting a
coupling argument that Lueker and Molodowitch used to show that double hashing
and ideal uniform hashing are asymptotically equivalent in the setting of open
address hash tables. We refer to their argument henceforth as the LM argument.
As far as we are aware, this is the first time this coupling approach has been
used for a hashing scheme outside of open addressing. Adapting the LM argument
gives new insights into double hashing for balanced allocations, as well as
into the potential for this approach to be used for other multiple choice
hashing schemes.

In particular, our modification of the LM argument involves significant changes. 
For reasons we explain, the LM argument does not seem to allow a direct coupling 
with random hashing; instead, we couple with an intermediary process, which is 
equivalent to random hashing plus some small bias that makes bins with heavy 
load slightly more likely. We then argue that this added bias does not affect 
the asymptotic performance of the balanced allocations process, providing the
desired connection between the balanced allocation process with random hashing
and double hashing. Specifically, for constant d, the maximum load remains
$\log\log n / \log d + O(1)$ with high probability, and the asymptotic fraction
of bins with constant load j can be determined using the method of differential
equations.

1.1 Related Work 

The balanced allocations paradigm, or the power of two choices, has been the
subject of a great deal of work. See, for example, the survey articles [?] for
references and applications.

The motivation for this paper stems from recent work showing that the
asymptotic fraction of bins of each load j (for constant j) for double hashing
can be determined using the same differential equations describing the behavior
for random hashing [13]. Insight from this approach also provides a proof
that, for a constant number of choices d, the maximum load is
$\log\log n / \log d + O(1)$ with high probability using double hashing. The
latter result is obtained by modifying the layered induction approach of [1] for
random hashing. Here we provide an alternative way of obtaining these results
by a direct coupling with a slightly modified version of random hashing, based
on the LM argument. The paper [13] also contains discussion of related work.

Of course, our work is also highly motivated by the chain of work [3, 6, 11, 19]
regarding the classical question of the behavior of double hashing for open
address hash tables, where empirical work had shown that the difference in
performance, in terms of the average length of an unsuccessful search sequence,
appeared negligible. Theoretically, the main result showed that for a table with
n cells and $\alpha n$ keys for a constant $\alpha$, the number of probed
locations in an unsuccessful search was (up to lower order terms)
$1/(1-\alpha)$ for both double hashing and uniform hashing [?]. We have not seen
this methodology applied to other hashing schemes such as balanced allocations,
although of course the issue of limited randomness is pervasive; recent examples
include studying the use of k-wise independent hash functions for linear probing
for small constant k [?].

2 Coupling Double Hashing and Random Hashing 

Before delving into our proof, it is worth describing the LM argument at a high 
level, as well as changes needed in the balanced allocation context. 

Consider the setting of open address hashing, where $m'$ keys have been placed
into a table of size n using uniform hashing. Suppose now we consider placing
the next key using double hashing instead of uniform hashing. The LM argument
shows that we can couple the decisions so that, with high probability (by which
we mean $1 - o(1)$), the end result in terms of where the key lands is the same.
Inductively, this means that if we start from an empty table, we can couple the
two processes step by step, and as long as the coupling holds, the two tables will
appear exactly the same.

However, there is a problem. Let us suppose that we run the processes for
$m = \alpha n$ keys. While the two processes match up on any single step with
high probability, this probability is not high enough (the per-step failure
probability is $\Omega(1/n)$) to guarantee that the two processes couple over
all m insertions with high probability. At some point, the two processes will
very likely deviate, and we need to consider that deviation.

In fact, the LM argument enforces that the deviation occur in a particular
way. They show that the probability a key ends in any given position from
double hashing is at most $1 + \delta$ times the probability a key ends in any
given position from uniform hashing for a $\delta$ that is $o(1)$. The coupling
then places the key according to double hashing with probability
$1/(1+\delta)$ in both tables, and with probability $\delta/(1+\delta)$ it
places the key to yield the appropriate distribution from uniform hashing. As a
result, both tables follow the placement given by uniform hashing; hence, in the
rare case where coupling fails, it fails in such a way that the double hashing
process has obtained a key placed according to uniform hashing.

When such a failure occurs, to the double hashing process, the key appears
as a randomly placed extra key that has entered the system and that was not
placed according to double hashing. The LM argument then makes use of the
following property: adding such an extra key only makes things worse, in that
every hash cell that would have been occupied at the end of the double hashing
process had the extra key not been added will still be occupied. This is a form
of domination that the LM argument requires.

As $\delta = o(1)$, the LM argument concludes by showing that if we run the
coupled process for $\alpha n + o(n)$ keys for a suitably chosen $o(n)$, then at
least $\alpha n$ keys will be added in the double hashing process according to
double hashing. That is, the number of extra keys added is asymptotically
negligible, giving the desired result: double hashing is stochastically
dominated by uniform hashing with an asymptotically negligible number of extra
keys, which does not affect the high order $1/(1-\alpha)$ term for an
unsuccessful search.

We attempt to make an analogous argument in the double hashing setting
for balanced allocations. A problem arises in that it seems we cannot arrange
for the coupling to satisfy the requirements of the original LM argument. As
mentioned, in the open address setting, each position is only at most
$1 + \delta$ times as likely to obtain a key (with high probability over the
results of the previous steps). This fact is derived from Chernoff bounds that
hold because each cell has a reasonable chance of being chosen; when there are
$m'$ cells filled, each cell is the next filled with probability approximately
$1/(n - m')$. But this need not be the case in the balanced allocation setting.
As an example, consider the dth most loaded bin; suppose for convenience it is
the only bin with a given load. Using random hashing, the probability it
receives a ball is $\Theta(n^{-d})$, as all d choices have to be among the d
most loaded bins. Using double hashing, the probability it receives a ball could
be $\Omega(n^{-2})$ if the d most loaded bins are in an arithmetic progression
that aligns with the double hashing. While in this example the d most loaded
bins align this way only rarely, in general the probability that some bin is
significantly more likely to obtain a ball when using double hashing does not
appear readily swept into o(1) failure probabilities.

However, intuitively, by concentration, this problem can only occur for bins 
that are rarely chosen under uniform hashing; that is, for bins with high load. 
We therefore can resolve the issue by not coupling double hashing with random 
hashing directly, but instead slightly perturbing the distribution from random 
hashing to give slightly more weight to heavily loaded bins, enough to cope with 
the relative looseness in the concentration bounds for rare events. We then show 
that this small modification to random hashing does not affect the characteristics 
of the final distribution of balls into bins that we have described above. 

3 Modified Random Hashing 

We start by defining the modified random hashing process that we couple with. 
Given a balls and bins configuration, we do the following to place the next ball: 

- with probability $n^{-0.4}$ we place the ball uniformly at random;

- with all remaining probability, we place the ball according to the least loaded
of d choices (with ties broken randomly).
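
One step of this process can be sketched as follows. This is a minimal illustration, not the paper's formal definition: the function name is hypothetical, the uniform-placement probability is taken to be $n^{-0.4}$ (the value that appears in the expression for $p_j$ in Section 4), and the d choices are drawn without replacement, as the paper assumes later.

```python
import random


def modified_random_hashing_step(loads: list[int], d: int, rng: random.Random) -> int:
    """Place one ball and return the bin that received it."""
    n = len(loads)
    if rng.random() < n ** -0.4:
        # Rare branch: ignore the loads and place the ball uniformly at random.
        target = rng.randrange(n)
    else:
        # Usual branch: least loaded of d distinct uniform choices,
        # with ties broken uniformly at random.
        choices = rng.sample(range(n), d)
        least = min(loads[b] for b in choices)
        target = rng.choice([b for b in choices if loads[b] == least])
    loads[target] += 1
    return target


# Illustrative run: 500 balls into 100 bins with d = 2.
rng = random.Random(1)
loads = [0] * 100
for _ in range(500):
    modified_random_hashing_step(loads, d=2, rng=rng)
print("maximum load:", max(loads))
```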

We briefly note the following results regarding this modified random hashing. 


Lemma 1. Let i, d, and T be constants. Suppose m = Tn balls are sequentially
thrown into n bins according to the modified random hashing process. Let
$X_i(T)$ be the number of bins of load at least i after the balls are thrown.
Let $x_i(t)$ be determined by the family of differential equations

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d,$$

where $x_0(t) = 1$ for all time and $x_i(0) = 0$ for $i \ge 1$. Then with
probability $1 - o(1)$,

$$\frac{X_i(T)}{n} = x_i(T) + o(1).$$
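
To get a feel for the limiting values in Lemma 1, the family of differential equations $dx_i/dt = x_{i-1}^d - x_i^d$ (written out in the Appendix) can be integrated numerically. The sketch below uses a plain forward Euler scheme; the function name, the truncation level `max_load`, and the step count are assumptions made for illustration, not part of the paper.

```python
def solve_load_odes(d: int, T: float, max_load: int = 8, steps: int = 10000) -> list[float]:
    """Approximate x_0(T), ..., x_{max_load}(T) by forward Euler integration."""
    x = [1.0] + [0.0] * max_load          # x_0(t) = 1 always, x_i(0) = 0 for i >= 1
    dt = T / steps
    for _ in range(steps):
        # dx_i/dt depends only on x_{i-1} and x_i, so truncating at max_load
        # does not change the equations for the retained indices.
        dx = [0.0] + [x[i - 1] ** d - x[i] ** d for i in range(1, max_load + 1)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Illustrative values: d = 2 choices and T = 1 (as many balls as bins).
x = solve_load_odes(d=2, T=1.0)
for i, xi in enumerate(x):
    print(f"limiting fraction of bins with load >= {i}: {xi:.6f}")
```

For d = 2 the first equation is $dx_1/dt = 1 - x_1^2$, so $x_1(t) = \tanh(t)$, which gives a convenient sanity check on the numerical output.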

Lemma 2. Let d and T be constants. Suppose m = Tn balls are sequentially
thrown into n bins according to the modified random hashing process. Then the
maximum load is $\log\log n / \log d + O(1)$, where the O(1) term depends on T.

Both proofs follow readily from the known proofs of these statements under
random hashing, with small changes to account for the modification. Intuitively,
only $n^{-0.4} \cdot Tn = Tn^{0.6}$ balls in expectation are distributed
randomly, which with high probability affects o(n) bins by at most O(1) amounts.
Hence, one would not expect the modification to the random hashing process to
change the load distribution substantially. More details are given in the
Appendix.
4 The Coupling Proof

We now formalize the coupling proof. While we generally follow the description 
of Lueker and Molodowitch, our different setting naturally requires changes and 
some different terminology. 

Recall that we assume that there is a table of n bins, where n is a prime.
We use the term hash pair to refer to one of the $n(n-1)$ possible pairs of
hash values $(f(j), g(j))$. We aim to consider the outcomes when m = cn balls
are placed using double hashing. We refer to the bin state as the ordered list
$(b_1, b_2, \ldots, b_n)$, where $b_i$ is the number of balls in the ith bin.
For any bin state, for every bin z, let $\hat{\eta}(z)$ be the number of hash
pairs that would cause z to obtain the next ball. It follows that the
probability $\eta(z)$ that z is the next bin to obtain a ball is
$\hat{\eta}(z)/(n(n-1))$.
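
For a small prime n, the probability that each bin z obtains the next ball under double hashing can be computed exactly by enumerating all n(n - 1) hash pairs. The sketch below does this; the function name, the example bin state, and the convention that a larger `rank` value means higher tie-breaking priority are assumptions made for illustration.

```python
from fractions import Fraction


def next_ball_probabilities(loads: list[int], rank: list[int], d: int) -> list[Fraction]:
    """Exact probability, for each bin, of receiving the next ball (n prime)."""
    n = len(loads)
    counts = [0] * n                      # hash pairs that send the ball to each bin
    for f in range(n):
        for g in range(1, n):
            choices = [(f + k * g) % n for k in range(d)]
            # Least loaded choice wins; ties go to the larger rank (assumption).
            target = min(choices, key=lambda b: (loads[b], -rank[b]))
            counts[target] += 1
    return [Fraction(c, n * (n - 1)) for c in counts]


# Illustrative bin state with n = 7 bins and d = 2 choices.
loads = [2, 0, 1, 1, 0, 3, 1]
rank = [3, 0, 6, 1, 5, 2, 4]
eta = next_ball_probabilities(loads, rank, d=2)
print(eta)
```

In this example bin 5 is the unique heaviest bin, so it never wins a comparison against a distinct second choice, while bin 4 (load 0, highest priority among the empty bins) wins whenever it is one of the two choices.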

When considering modified random hashing with d choices, we assume the
d choices are made without replacement. This choice does not matter, as it is
known the difference in performance between choosing with and without
replacement is negligible. Specifically, for constant d, the expected number of
balls that would choose some bin more than once is constant, and is O(log n)
with high probability; this does not affect the asymptotic behavior of the
system. In our setting, since with double hashing the choices are without
replacement, it makes the argument details somewhat easier.

Similarly, we technically need to consider what to do if there is a tie for the
least loaded bin. For convenience, in case of a tie we assume that the tie is broken
randomly among the bins that share the least load, but in the following coupled
fashion. At each step, we assume a ranking is given to the bins (according to a
random permutation of $[1, n]$); the rankings at each step are independent and
uniform. In case of a tie in the load, the rank is used to break the tie. Note
that, at each step, we then have a total ordering on the bins, where the order is
determined first by the load and then by the rank. We refer to the jth ordered
bin, with the following meaning: the first ordered bin is the heaviest loaded with
lowest priority in tie-breaking, and the nth ordered bin is the least loaded with
the highest priority in tie-breaking. Hence, with random hashing, the jth ordered
bin obtains the next ball with probability

$$\frac{d}{n} \cdot \frac{\binom{j-1}{d-1}}{\binom{n-1}{d-1}}.$$

The d/n term represents that the jth ordered bin must be one of the d choices;
the other term represents that the remaining d - 1 choices must be from the
top j - 1 ordered elements.
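
The probability just described is the product of d/n and the chance that the other d - 1 choices, drawn without replacement from the remaining n - 1 bins, all fall among the top j - 1 ordered bins, that is, $\frac{d}{n}\binom{j-1}{d-1}/\binom{n-1}{d-1}$. The sketch below evaluates this expression exactly and checks that it sums to 1 over j; the function name and the values of n and d are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def ordered_bin_probability(j: int, n: int, d: int) -> Fraction:
    """Probability that the jth ordered bin gets the next ball under random hashing."""
    # d/n: the jth ordered bin is one of the d choices.
    # comb(j-1, d-1) / comb(n-1, d-1): the other d-1 choices are among the top j-1.
    return Fraction(d, n) * Fraction(comb(j - 1, d - 1), comb(n - 1, d - 1))


# Illustrative values: n = 10 bins and d = 3 choices.
n, d = 10, 3
probs = [ordered_bin_probability(j, n, d) for j in range(1, n + 1)]
print(probs, sum(probs))
```

The first d - 1 ordered bins have probability zero, since such a bin cannot be the least loaded among d distinct choices.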

We extend the domination concept used in the LM argument in the natural way.
We say a bin state $B = (b_1, b_2, \ldots, b_n)$ dominates a bin state
$A = (a_1, a_2, \ldots, a_n)$ if $b_i \ge a_i$ for all i. We may write
$B \succeq A$ when B dominates A.
The following is the key point regarding domination:

Lemma 3. If $B \succeq A$, and we insert a ball into a table B to obtain $B'$
and the same ball into A to obtain $A'$ by using the least loaded of d choices,
then (whether we use double hashing, random hashing, or modified random hashing)
$B' \succeq A'$.

Proof. Suppose the bin choices are $i_1, i_2, \ldots, i_d$. Without loss of
generality let $i_1$ be the least loaded of these choices in bin state A (or the
bin chosen by our tie-breaking scheme). If $i_1$ is not chosen as the least
loaded in B, it must be because $b_{i_1} > a_{i_1}$, and hence even after the
ball is placed, $B' \succeq A'$. ■
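
The domination property can also be checked empirically on random instances. The sketch below is a sanity check rather than a proof: the function names are hypothetical, both tables see the same d choices and the same tie-breaking ranks, and a larger rank value is assumed to mean higher tie-breaking priority.

```python
import random


def place_least_loaded(loads: list[int], choices: list[int], rank: list[int]) -> list[int]:
    """Return a new bin state after placing one ball in the least loaded choice."""
    target = min(choices, key=lambda b: (loads[b], -rank[b]))
    new_loads = list(loads)
    new_loads[target] += 1
    return new_loads


def dominates(B: list[int], A: list[int]) -> bool:
    """True when every bin in B holds at least as many balls as in A."""
    return all(b >= a for b, a in zip(B, A))


def check_domination_preserved(n: int, d: int, trials: int, rng: random.Random) -> bool:
    for _ in range(trials):
        A = [rng.randrange(4) for _ in range(n)]
        B = [a + rng.randrange(3) for a in A]     # B dominates A by construction
        rank = list(range(n))
        rng.shuffle(rank)                         # shared tie-breaking ranks
        choices = rng.sample(range(n), d)         # shared choices for the ball
        if not dominates(place_least_loaded(B, choices, rank),
                         place_least_loaded(A, choices, rank)):
            return False
    return True


preserved = check_domination_preserved(n=8, d=3, trials=5000, rng=random.Random(2))
print("domination preserved in all trials:", preserved)
```

Sharing the ranks between the two tables matters: if each table broke ties independently, two bins tied in both states could receive the ball in different tables and domination could fail.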


Our goal now is to show that if we have a table which has been filled up to that 
point by modified random hashing, we can couple appropriately. That is, we can 
couple by using the result of a double hashing step with high probability, and 
with some small probability we use a modified random hashing step, giving the 
double hashing process an extra ball. 

To begin, we note that with modified random hashing, the jth ordered bin
obtains a ball with probability

$$p_j = (1 - n^{-0.4}) \frac{d}{n} \cdot \frac{\binom{j-1}{d-1}}{\binom{n-1}{d-1}} + n^{-0.4} \cdot \frac{1}{n} = (1 - n^{-0.4}) \frac{d}{n} \cdot \frac{\binom{j-1}{d-1}}{\binom{n-1}{d-1}} + n^{-1.4}.$$

In the right hand side of the first equality, the first term expresses the probability
that there are d choices and that the ball chooses the jth ordered bin. The second
term arises from the probability that the ball is placed randomly after choosing
a single bin.

We wish to show the following: 

Lemma 4. Suppose a bin z is jth in the ordering after starting with an empty 
table and adding n' balls by modified random hashing. Then 

V{z) <Pj + 

except with probability where this probability is over the random bin 

state obtained from the n' placed balls. 

We remark that the constants here were chosen for convenience and not 
optimized; this is sufficient for our asymptotic statements. 


Proof. If z is jth in the ordering, we have

$$E[\eta(z)] = \frac{d}{n} \cdot \frac{\binom{j-1}{d-1}}{\binom{n-1}{d-1}}; \qquad E[\hat{\eta}(z)] = d(n-1) \frac{\binom{j-1}{d-1}}{\binom{n-1}{d-1}}.$$

Here we use the fact that, under modified random hashing, the ordering of the
bins forms a uniform permutation. Hence, in expectation, double hashing yields
the same probability for a bin obtaining a ball as random hashing. However, we
must still show that individual probabilities are close to their expectations.

We first show that when $E[\eta(z)]$ is sufficiently large then $\eta(z)$ is
close to its expectation, which is unsurprising. When $E[\eta(z)]$ is small, so
that tail bounds are weaker, we are rescued by our modification to $p_j$; the
additional $n^{-1.4}$ skew in the distribution that we have added for modified
random hashing will give our desired bound between $\eta(z)$ and $p_j$.
In this case, we use martingale bounds; we use martingales instead of Chernoff
bounds because there is dependence among the behavior of the $d(n-1)$ hash
pairs that include bin z.

We set up the martingale as follows. We refer to the bins as bins 1 to n.
Without loss of generality let z be the last bin (labeled n) and let $Z_i$ be
the rank in the bin ordering of the ith bin, for i = 1 to n - 1. We expose the
$Z_i$ one at a time to establish a Doob martingale [?, Section 12.1]. Let

$$Y_i = E[\hat{\eta}(z) \mid Z_1, \ldots, Z_i].$$

Note $Y_0 = E[\hat{\eta}(z)]$ and $Y_{n-1} = \hat{\eta}(z)$. We claim that

$$|Y_i - Y_{i-1}| \le d^2.$$

To see this, note that changing our permutation of the ordering of the bins by
switching the rank order of two bins a and b can only affect the hash pairs that
include z and a or z and b; there are fewer than $d^2$ such sequences, since
there are $\binom{d}{2}$ hash pairs that include any pair of bins (determined by
which of the d hashes each of the two bins corresponds to).

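Hence we can apply the standard Azuma-Hoeffding inequality (see, e.g.,
[?, Theorem 12.4]) to obtain

$$\Pr(|Y_{n-1} - Y_0| \ge \lambda) \le 2e^{-\lambda^2/(2(n-1)d^4)}.$$

Hence

$$\Pr(|\hat{\eta}(z) - E[\hat{\eta}(z)]| \ge \lambda) \le 2e^{-\lambda^2/(2(n-1)d^4)}.$$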

We now break things into cases. First, suppose z and j are such that E [viz)] > 
nO-55^ \Ye choose A = to obtain 

Pr(|r)(z) - E[r7(^)]| > n"'"") < < e-"”'". 

for sufficiently large n. Hence 

Pr(|77(z) - E[7 ?(z)]| > ({n - 1)) < 

We also note that in this case

$$p_j \ge (1 - n^{-0.4}) E[\eta(z)],$$

so

$$p_j (1 + 2n^{-0.4}) \ge E[\eta(z)]$$

for large enough n. It follows that

Pr(r?(z) - (1 + l{n - 1)) < e""”'"'. 

Further, $p_j \ge n^{0.55}(1 - n^{-0.4})/(n(n-1))$. Simplifying the above we find

Pr(77(z) -p, > 2n-°'4p, + - 1)) < 
which implies


Pr(7?(z)-p, <e-" ' . 

Hence r]{z) < (1 + with very high probability over the bin state. 

Now, consider when z and j are such that $E[\hat{\eta}(z)] < n^{0.55}$. We again choose

A = to obtain 

Pr(|r 7 (z) — £[ 77 ( 2 :)] I > /{n — 1)) < ° . 

In this case, $p_j \ge n^{-1.4}$ and hence greater than $E[\eta(z)]$ for sufficiently large n.
Hence

Pr(r7(z)-p, >n-°'47(n-l)) 

and therefore

Pr(77(z) — pj > n~^'°^pj) < e“" 

In both cases, we have 77 ( 2 ) < {l + n~°'^^)pj with probability at most e“"° 
a union bound gives the result. ■ 


From this point, we can return to following the LM argument. We have shown
that, with high probability, the probability a bin is chosen using double
hashing is at most $(1 + \delta)$ times that of modified random hashing, for the
$\delta = o(1)$ given by Lemma 4. We therefore consider the following algorithm
to bound the performance of throwing m = cn balls into n bins using double
hashing. In what follows, we discuss a bin z that we consistently make jth in
the ordering, so with modified random hashing the probability a ball lands in z
is $p_j$, and with double hashing this probability is $\eta(z)$.

1. We throw $(1 + 2\delta)m$ balls.

2. If, at any step, we have $\eta(z) > (1 + \delta)p_j$ for any bin z, the
algorithm fails and we stop. Otherwise, we place balls as follows.

3. At each step, with probability $1/(1 + \delta)$, we place a ball according to
double hashing.

4. Otherwise, with probability $\delta/(1 + \delta)$, we place a ball with
probability $((1 + \delta)p_j - \eta(z))/\delta$ into bin z.
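
The placement rule in steps 3 and 4 can be checked exactly on a small instance. The sketch below is an illustration under several assumptions that are not in the paper: the function name is hypothetical, a rational parameter `eps` stands in for the uniform-placement probability $n^{-0.4}$ so that the arithmetic stays exact, $\delta$ is simply taken as the smallest value for which no bin violates $\eta(z) \le (1 + \delta)p_j$, and a larger `rank` value means higher tie-breaking priority.

```python
from fractions import Fraction
from math import comb


def coupled_step_distributions(loads, rank, d, eps):
    """Return (eta, p, delta, residual, mixture) for one coupled step (n prime)."""
    n = len(loads)
    # Position j (1-indexed) in the ordering: heaviest, lowest-priority bin first.
    order = sorted(range(n), key=lambda b: (-loads[b], rank[b]))
    position = {b: j for j, b in enumerate(order, start=1)}

    # eta[z]: double hashing probability, from all n(n-1) hash pairs.
    counts = [0] * n
    for f in range(n):
        for g in range(1, n):
            choices = [(f + k * g) % n for k in range(d)]
            counts[max(choices, key=position.get)] += 1   # last in ordering wins
    eta = [Fraction(c, n * (n - 1)) for c in counts]

    # p[z]: modified random hashing probability for the bin's position j.
    p = [(1 - eps) * Fraction(d, n)
         * Fraction(comb(position[z] - 1, d - 1), comb(n - 1, d - 1)) + eps / n
         for z in range(n)]

    # Smallest delta with eta[z] <= (1 + delta) * p[z] for every bin (assumption).
    delta = max(eta[z] / p[z] for z in range(n)) - 1
    # Step 4: distribution used with probability delta / (1 + delta).
    residual = [((1 + delta) * p[z] - eta[z]) / delta for z in range(n)]
    # Overall distribution of the placed ball.
    mixture = [eta[z] / (1 + delta) + delta / (1 + delta) * residual[z]
               for z in range(n)]
    return eta, p, delta, residual, mixture


# Illustrative instance: n = 7 bins, d = 3 choices, eps = 1/4.
eta, p, delta, residual, mixture = coupled_step_distributions(
    loads=[2, 0, 1, 1, 0, 3, 1], rank=[3, 0, 6, 1, 5, 2, 4], d=3, eps=Fraction(1, 4))
print("delta =", delta, " mixture equals p:", mixture == p)
```

The residual weights are nonnegative exactly because no bin violates the condition in step 2, and they sum to 1 because both $\eta$ and $p$ are probability distributions.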

Theorem 1. With high probability, the algorithm above places $(1 + 2\delta)m$
balls according to modified random hashing, and at least m balls are placed
according to double hashing. The final bin state therefore dominates that of
placing m balls using double hashing with high probability.


Proof. A simple calculation shows that each ball lands in the jth ordered bin
with probability

$$\frac{1}{1 + \delta}\,\eta(z) + \frac{\delta}{1 + \delta} \cdot \frac{(1 + \delta)p_j - \eta(z)}{\delta} = p_j.$$


So each ball is placed with the same distribution as for modified random hashing,
as long as no bin has $\eta(z) > (1 + \delta)p_j$. By Lemma 4 and a union bound
over the steps of adding balls, the probability of such a failure is o(1).
Let B be the number of balls placed by double hashing when using the above
algorithm. Then

$$E[B] = (1 + 2\delta)m/(1 + \delta) \ge (1 + \delta/2)m.$$

A simple Chernoff bound [?, Exercise 4.13] gives

$$\Pr(B \le m) \le e^{-2m(\delta/2)^2/(1 + 2\delta)},$$

which vanishes as n grows when m = cn for a constant c.

The extra balls placed by modified random hashing are handled via our
domination result, Lemma 3. ■


The following corollary is immediate from the domination.


Corollary 1. Let d and T be constants. Suppose m = Tn balls are sequentially
thrown into n bins according to double hashing. Then the maximum load is
$\log\log n / \log d + O(1)$, where the O(1) term depends on T.

This next corollary follows from the fact that the algorithm shows that, 
step by step, the double hashing process and the modified hashing process are 
governed by the same family of differential equations, as the probability of going 
into a bin of a given load differs by o(l) between the two processes. 


Corollary 2. Let i, d, and T be constants. Suppose m = Tn balls are
sequentially thrown into n bins according to double hashing. Let $X_i(T)$ be the
number of bins of load at least i after the balls are thrown. Let $x_i(t)$ be
determined by the family of differential equations

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d,$$

where $x_0(t) = 1$ for all time and $x_i(0) = 0$ for $i \ge 1$. Then with
probability $1 - o(1)$,

$$\frac{X_i(T)}{n} = x_i(T) + o(1).$$


5 Conclusion 

We have shown that the coupling argument of Lueker and Molodowitch can,
with some modification of the standard random hashing process, yield results for
double hashing with the balanced allocations framework. It is worth considering
if this approach could be generalized further to handle other processes, most
notably cuckoo hashing and peeling processes, where double hashing similarly
seems to have the same performance as random hashing [?]. The challenge here
for cuckoo hashing appears to be that the state change on entry of a new key is
not limited to a single location; while only one cell in the hash table obtains a
key, other cells become potential future recipients of the key if it should move,
effectively changing the state of those cells. This appears to break the coupling
method of the LM argument, which conveniently can forget the choices involved
after an item is placed. The issue similarly arises for peeling processes, Robin
Hood hashing, and other hashing schemes involving multiple choice. However,
we optimistically suggest there may be some way to further modify and extend
this type of argument to remove this problem.
Appendix

We briefly sketch the proofs of the following results regarding modified random
hashing that we discussed in Section 3.

Lemma 1. Let i, d, and T be constants. Suppose m = Tn balls are sequentially
thrown into n bins according to the modified random hashing process. Let
$X_i(T)$ be the number of bins of load at least i after the balls are thrown.
Let $x_i(t)$ be determined by the family of differential equations

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d,$$

where $x_0(t) = 1$ for all time and $x_i(0) = 0$ for $i \ge 1$. Then with
probability $1 - o(1)$,

$$\frac{X_i(T)}{n} = x_i(T) + o(1).$$

Proof. (Sketch.) We note that this result holds for either modified random
hashing or random hashing. For random hashing, the result is a well known
application of the fluid limit approach. Specifically, suppose we let $X_i(t)$
be a random variable denoting the number of bins with load at least i after tn
balls have been thrown, and let $x_i(t) = X_i(t)/n$. For $X_i$ to increase when
a ball is thrown, all of its choices must have load at least i - 1, but not all
of them can have load at least i. Let us first consider the case of random
hashing. For $i \ge 1$,

$$E[X_i(t + 1/n) - X_i(t)] = (x_{i-1}(t))^d - (x_i(t))^d.$$

Let $\Delta(x_i) = x_i(t + 1/n) - x_i(t)$ and $\Delta(t) = 1/n$. Then the above
can be written as:

$$E\left[\frac{\Delta(x_i)}{\Delta(t)}\right] = (x_{i-1}(t))^d - (x_i(t))^d.$$

In the limit as n grows, we can view the limiting version of the above equation
as

$$\frac{dx_i}{dt} = x_{i-1}^d - x_i^d.$$

The works of Kurtz and Wormald [?] justify convergence of the random
hashing process to the solution of the differential equations. Specifically, it
follows from Wormald's theorem [?, Theorem 1] that

$$X_i(t) = n x_i(t) + o(n)$$

with probability $1 - o(1)$, which matches the desired result.

Now we note that for the modified hashing process, with the corresponding
variables, we have

$$E[X_i(t + 1/n) - X_i(t)] = (1 - n^{-0.4})\left((x_{i-1}(t))^d - (x_i(t))^d\right) + (x_{i-1}(t) - x_i(t))\, n^{-0.4}.$$

Wormald's theorem [?, Theorem 1] allows o(1) additive terms and yields the
same result, namely $X_i(t) = n x_i(t) + o(n)$ with probability $1 - o(1)$. ■


Lemma 2. Let d and T be constants. Suppose m = Tn balls are sequentially
thrown into n bins according to the modified random hashing process. Then the
maximum load is $\log\log n / \log d + O(1)$, where the O(1) term depends on T.

Proof. The proof is a simple modification of the layered induction proof of
[1, Theorems 3.2 and 3.7]. For convenience, we consider just the case of m = n
to present the main idea, which corresponds to [1, Theorem 3.2]. The theorem
inductively shows that for balanced allocations with random hashing the number 
of bins with load at least i is bounded above with high probability by 


for i > 6 and i < i* for some i* < Inlnn/lnd + 0 ( 1 ), where for i* we have 
Pf,/n’^ < 2Inn. 

The same results hold with essentially the same induction when using the
modified random hashing; however, one must stop the induction earlier. In
particular, the probability that a ball lands in a bin with load at least i is now
given by (1 — -I- once we can no 

longer use the induction. Let $i^* \le \ln\ln n / \ln d + O(1)$ be the point
where the induction step no longer applies using modified random hashing. At
that point the probability any specific bin with load at least $i^*$ obtains a
ball at any time step is at most $2n^{-1.4}$. The probability any bin with load
$i^*$ obtains three more balls is thus bounded above by
$\binom{n}{3} n (2n^{-1.4})^3 = O(n^{-0.2})$, so the maximum load
is $i^* + 3$ with high probability. ■